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Abstract 
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Introduction 

Thermodynamics-based ab initio RNA secondary structure 
algorithms are used to detect microRNAs [1], targets of 
microRNAs [2], non-coding RNA genes [3], temperature- 
dependent riboregulators [4], selenoproteins [5], ribosomal 
frameshift locations [6], RNA-protein binding sites [7], etc. The 
importance and ubiquity of RNA thermodynamics-based algo- 
rithms cannot be overemphasized - there are even applications in 
RNA design for novel cancer therapies and in synthetic biology. 
Indeed, in [8] Vashishta et al. used the RNA minimum free energy 
(MFE) structure prediction algorithm mfold [9] to design seven 
anti-pCD ribozymes, four of which were cloned, stably transfected 
in the highly metastatic human breast cancer cell line, MDA-MB- 
231, and shown to have a therapeutic potential by knocking down 
the expression of pCD. (Procathepsin D (pCD) is correlated with 
highly invasive malignancies, such as breast cancer. Ribozymes, 
first discovered by the Nobel laureats, T. Cech and S. Altman, are 
RNA enzymes that can cleave a molecule or catalyze a reaction.) 

Following pioneering work of the Tinoco Lab and Freier et al. 
[10], a number of increasingly sophisticated nearest neighbor models 
have been defined: INN [11,12], INN-HB, also called Turner99 
[13], Tumer2004 [14,15], as well as models that incorporate 
knowledge-based parameters [16,17]. These free energy param- 
eters of the nearest neighbor (NN) model form the foundation for 
essentially all current thermodynamics-based RNA algorithms: 
minimum free energy (MFE) secondary structure [9,18], Boltz- 
mann partition function [19], maximum expected accuracy 
secondary structure [20], MFE secondary structure with pseudo- 
knots [21], sampling suboptimal structures [22], RNA sequence- 
structure alignments [23], etc. 

Benchmarking studies have shown that, on average, the 
minimum free energy structure includes 73% of base pairs in X- 
ray structures when domains of fewer than 700 nucleotides (nt) are 



folded [24]; i.e. prediction sensitivity of the MFE structure is 73%, 
although accuracy drops as sequence length increases. There is 
increasing evidence that by improving the free energy parameters, 
structure prediction accuracy can be improved. Andronescu et al. 
[16] used combinatorial optimization to determine optimal 
weights oc,P for which energy parameters are determined by a- 
weighted contribution from Turner's free energies together with fi- 
weighted contribution from knowledge-based potentials, the latter 
obtained from the negative logarithm of frequencies in existent 
structure databases. Free energy parameters in the Turner model 
are determined by a least-squares fit of UV absorption data based 
on the assumption that change in heat capacity, ACp, is zero. This 
assumption is erroneous, as pointed out by Mikulecky and Feig 
[25], who observed that the hammerhead ribozyme does not fold 
in 2-state transition, but rather has 3 states: cold denatured, folded 
and hot denatured. In [17] M. Bon improved MFE structure 
prediction by defining new parameters for the nearest neighbor 
model that account for linear dependence of change ACp of heat 
capacity on sequence length and by incorporating knowledge- 
based potentials from a hand-curated selection of SprinzPs transfer 
RNA database [26]. 

Subsection 1.1: Motivation from protein helix-coil 
transition 

Consider a coarse-grain classification of amino acids, where a 
polypeptide chain is given by an n-mer, or length n sequence 
a\, . . . ,a„ of amino acids, where each residue «,■ is either in an H 
(a-helix) or C (coil) conformation. Assume that the energy of an a- 
helical residue is E(H) = £n < 0, while that of a coil residue is 
E(C) = 0. A protein with many residues in an a-helical confor- 
mation at room temperature, such as hemoglobin, will unfold into 
a random coil at a higher temperature, where all previous H 
residues have been transformed into C residues. In particular, if 
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Figure 1. Graph of the expected number of base pairs as a function of temperature for signal recognition particle with Rfam [56] 
accession number X12643. Temperature in degrees Celsius is given on the .Y-axis, while the expected number of base pairs (base pairs/n>, 
normalized by sequence length n, is given on the y-axis. Note the linear dependence on temperature for the non-cooperative Nussinov energy 
model, in contrast to the sigmoidal dependence on temperature for the cooperative Turner energy model. Data and figure taken from our paper [57]. 
doi:10.1371/journal.pone.0085412.g001 



a\, . . . ,a n is an ct-helix, then at low temperature, all residues are H, 
while at high temperature all residues are C. The partition 
function Z of a,\, . . . ,a„ is defined by Z= J2 S ex P( — E( & )/RT), 
where the sum is taken over all 2" many sequences s of H's and 
C's. Using the (temperature-dependent) partition function, we can 
compute the expected number (//) of a-helical residues for the ri- 
mer a\,...,a n at absolute temperature T, defined by 



<H>- 



dlnZ 
3 In s 



where s = exp( — cq/RT) - see [27]. Subsequently, it is possible to 



plot the expected helical fraction 



as a function of temperature. 



Non-cooperative energy models show an approximately linear 
relation, where the expected helical fraction slowly decreases as 
temperature increases. In contrast, the plot of expected helical 
fraction versus temperature for cooperative energy models displays a 
sigmoidal shape, where there is an abrupt helix-coil transition from 
high to low values for the helical fraction that occurs at a critical 
temperature Tm- 

Polymer theory provides several mathematical models to 
explain the temperature-dependent helix-coil transition for proteins. 
The simplest polymer model for the helix-coil transition of an a- 
helix is the non-cooperative model, where the probability that the 
each residue is H is independent of the conformation of every 
other residue. The cooperative, nearest-neighbor model for helix- 
coil transition, introduced by Zimm and Bragg [28], includes 
nucleation free energy <5>0 that is applied for each a-helical 
segment of contiguous H residues. Finally, the Ising model was 



introduced by E. Ising in 1925 [29] to explain ferromagnetism, but 
has subsequently been used to model protein temperature- 
dependent helix-coil transitions - see, for instance [30]. Progress- 
ing from the independent model to the Zimm-Bragg model to the 
Ising model, each model is increasingly cooperative, thus 
providing a better fit to the experimental data. See Dill and 
Bromberg [27] for a more detailed discussion. 

In the Nussinov energy model [31] for RNA secondary 
structure, the free energy of a secondary structure S is defined 
to be — 1 times the number \S\ of base pairs of S; i.e. in the 
Nussinov model, each base pair contributes an energy of — 1 , and 
there is no energy term for entropic considerations. The Turner 
energy model [13,32] for RNA secondary structure contains 
negative free energies for base stacks, which depend on the 
nucleotides involved, such as the base stacking free energy of 

5'-AC-3' 

-2.24 kcal/mol at 37°C for and of -3.26 kcal/ 

3'-UG-5' 

5'-CC-3' 

mol for . Additionally the Turner energy model 

3'-GG-5'. 7 67 

contains free energies for various loops (hairpin, bulge, internal 
loop, multiloop) that include entropic considerations. Clearly, the 
Turner energy model for RNA secondary structure is analogous to 
the cooperative, nearest-neighbor model for helix-coil transitions 
introduced by Zimm and Bragg. Figure 1 contrasts the temper- 
ature-dependent cooperativity of the Turner energy model with 
the temperature-independent non-cooperativity of the Nussinov 
energy model. The motivation for this paper is to solve the 

? Ising 

equation: — = — . Though we do not determine 

Turner Zimm — Bragg 
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Table 1. Values of sensitivity and positive predictive value (ppv) for RNAfold and RNAenn with respect to various RNA families. 



RNA family 


RNAfold -d 0 




RNAenn (Turner99) 


RNAenn (Turner04) 


sens 


ppv 




ppv 




ppv 


16s 


0.3940 


0.3326 


0.3779 


0.3153 


0.3099 


0.2674 


23sd 


0.5974 


0.5311 


0.5527 


0.4813 


0.4409 


0.4003 


23s 


0.4685 


0.3972 


0.4453 


0.3738 


0.3516 


0.3061 


5s 


0.7575 


0.6606 


0.7319 


0.6366 


0.5713 


0.5093 


ec 


0.5869 


0.5314 


0.6184 


0.5519 


0.5338 


0.4982 


grpl 


0.6625 


0.5832 


0.5047 


0.4650 


0.4837 


0.4589 


grplii 


0.6616 


0.6409 


0.6084 


0.5976 


0.4555 


0.4334 


rnapl 


0.4051 


0.3813 


0.3514 


0.3221 


0.3154 


0.3000 


rnap2 


0.4241 


0.4046 


0.4935 


0.4648 


0.3285 


0.3187 


short 


0.4048 


0.3400 


0.3690 


0.3298 


0.3155 


0.2760 


srp 


0.7228 


0.5632 


0.6286 


0.4897 


0.5677 


0.4544 


telomerase 


0.4285 


0.3074 


0.3417 


0.2404 


0.3605 


0.2662 


tmRNA 


0.2248 


0.1958 


0.1911 


0.1622 


0.1526 


0.1326 


trna2 


0.4960 


0.4697 


0.5344 


0.5005 


0.3962 


0.3828 


avg 


0.5213 


0.4575 


0.4866 


0.4279 


0.4020 


0.3678 



Sensitivity is the ratio of number of correctly predicted base pairs divided by the number of base pairs in the native structure; positive predictive value is the ratio of the 
number of correctly predicted base pairs divided by the number of base pairs in the predicted structure. Since RNAenn currently does not include energy contributions 
for dangles (single stranded, stacked nucleotides), RNAfold was used without dangles (version 1.8.5 with -d 0flag). To our knowledge, there has not been a careful 
benchmarking of structure prediction accuracy between the Turner 1 999 energy model and the newer Turner 2004 energy model, though it is interesting to note that 
RNAenn has better structure prediction when using Turner 1 999 for base stacking. Overall, it is clear that RNAfold outperforms RNAenn (Turner99), although a few cases, 
such as ec and rnap2 RNAenn have better sensitivity. Nevertheless, we expect much better performance in the future when our triplet and base stacking energy terms 
have been refined by using knowledge-base potentials. The database of RNA structures in this benchmarking set comes from a data collection of D.H. Mathews 
(personal communication}, which derives from published databases [26,54], etc. See [55] for a citation of original data sources. 
doi:1 0.1 371 /journal.pone.008541 2.t001 



the analogue of the Ising model for RNA secondary structure 
formation, we do introduce an extended nearest-neighbor model, also 
called triplet or next-nearest-neighbor model, which displays somewhat 
more cooperativity, as displayed in the sharpness of the transition 
from folded to unfolded state in a figure shown later in the paper. 

Subsection 1.2: Triplet model 

As previously mentioned, the nearest-neighbor energy model 
[13,32] assigns free energies for base stacks of the form 
5'-AB-3' 

for the formation of a stacked base pair between 
3'-DC-5' v 

5' — AB — 3' with 5' — CD — 3'. In contrast, the extended-nearest- 
neighbor triplet model assigns free energies for triplexes of the 
5' -ABC -3' 

form where a stacked triple (two contiguous base 

3'-FED-5' 

stacks) between 5' —ABC— 3' and 5' — DEF — 3'. In this case, we 

5' -ABC -3' 

expect that the triplet free energy of can be 

3' -FED -5' 

approximated by the average of the base stacking free energies for 

5'-AB-3' 5'-BC-3' 

and ; however, we expect the triplet 

3'-FE-5' 3'-ED-5' 

energies to more accurately model the formation of secondary 
RNA structure. 

The extended-nearest-neighbor triplet model for hybridized 
DNA duplexes and DNA-RNA hybrids was considered in 
experimental work of D.M. Gray, who in Table 1 of [12] 
determined the theoretical number of independent hybridized 
sequences that must be considered in UV absorbance experiments, 
in order to obtain triplet stacking free energies by least-squares 



fitting of data. In [33] Gray et al. experimentally determined in vivo 
inhibition parameters for next-nearest-neighbor triplets in the case 
of antisense DNA - RNA hybridization to inhibit protein 
expression. In [34], Najafabadi et al. applied a neural network 
to predict the thermodynamic parameters for the next-nearest- 
neighbor triplet model, using existent UV absorbance data from 
the thermodynamic database for nucleic acids, NTDB version 2.0 
[35]. 

Though at present there are no experimentally determined free 
energies for triplet stacking, Binder et al. [36] did show a strong 
correlation between microarray fluorescence intensities and DNA- 
RNA base stacking free energies of Sugimoto et al. [37]. More 
precisely, Binder et al. showed that linear combinations of triple- 
averaged probe sensitivities provide nearest-neighbor sensitivity 
terms, that rank in similar order as the base stacking free energy 
parameters for DNA-RNA in solution [37]. It is our hope that 
future improvements in RNAseq, microarray or other technologies 
will ultimately furnish experimentally determined triplet and even 
/:-tuple stacking free energies. New triplet free energies could 
immediately be incorporated into our algorithms, and it is tedious, 
but clear how one can modify our algorithms to handle /:-tuple 
free energies. 

Subsection 1.3: Plan of the paper 

In this paper, we describe the first algorithms to compute the 
partition function and minimum free energy structure for single- 
stranded RNA, with respect to the full next-nearest-neighbor triplet 
energy model for RNA. In the Introduction, we gave the motivation 
for this work, coming from the Zimm-Bragg and Ising models in 
biopolymer theory. The plan for the remainder of the paper is as 
follows. In the Results section, Section 2.1 gives the notation and 
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definitions needed for the sequel, while Section 2.2 presents the 
extended nearest neighbor model and method used to obtain 
energy parameters. In the Discussion, we give secondary structure 
benchmarking results for the nearest neighbor (NN) and extended 
nearest neighbor (ENN) energy models. Additionally, the co- 
operativity of folding is compared with both energy models. In the 
Methods section, Section 3.1 [resp. Section 3.2] presents recursions 
for the partition function [resp. minimum free energy structure] 
computation. In addition to the software RNAnn and RNAenn 
developed for this paper, we use Vienna RNA Package RNAfold 
[18], RNAstructure [38], and mfold [9]. As illustration for the 
cooperativity of folding, we compare melting curves for two small 
nucleolar RNAs (snoRNA), with respect to the NN and ENN 
energy models; additional melting curves are available on the web 
server http://bioinformatics.bc.edu/clotelab/RNAenn/. These 
results suggest that the the extended nearest-neighbor energy 
model may lead to more cooperative folding than does the nearest- 
neighbor model, which was our motivation to study the ENN 
energy model. 

The goal of this paper is to describe the non-trivial RNAenn 
algorithms, which are implemented in C/C++. Our work points 
toward a future potential improvement in RNA secondary 
structure prediction, either by incorporating triplet knowledge- 
based potentials or experimentally inferred extended nearest- 
neighbor free energy parameters. 

Results: Extended nearest neighbor model 
algorithms 

Assume that a\, . . . ,a„ is a given RNA sequence. In this section, 
we describe pseudocode for the partition function and minimum 
free energy computation for an extended nearest neighbor model. 
Although our software, RNAenn, does depend on the exact values 
of the extended nearest-neighbor energy parameters, the descrip- 
tion of the algorithms does not. 



Hairpin 

r --Q-G 

U 

Stack *i 35 37 

rti 

Multiloop A U " A 



29 



Bulge 

U 



u 



42 n-G, 

u -ir - A .1/ 

\ rA, C 49 



54 



4. G-C 58 

r> C A 



internal 
loop 



i 



A 
■ A 



A 
U ■ 



61 



Figure 2. Depiction of elements of RNA secondary structure for 
which experimentally determined free energy parameters are 
available. In this 61 nt RNA, the hairpin loop closed by base pair 
between nucleotides at position 43 and 48 is known as a tetraloop, or 
hairpin loop of size 4. Similarly, the hairpin loop of size 7 is closed by a 
base pair between nucleotides at positions 17 and 25. Free energy 
parameters for bulges and internal loops (two-sided bulges, not shown 
in the figure) are available, while an affine approximation is used for the 
free energy of a multiloop or junction. 
doi:1 0.1 371 /journal.pone.008541 2.g002 



Subsection 2.1: Notation and definitions 

Let a = ci\, . . . ,a n be an arbitrary RNA sequence, and let a[ij] 
denote the subsequence a,, . . . ,cij. A secondary structure S for a given 
RNA sequence a = a\, . . . ,a„ is a set of base pairs (ij), \<i<j<n, 
such that (1) a,,fl/ forms a Watson Crick AU, UA, GC, CG or 
wobble GU, UG pair; (2) each base is paired to at most one other 
base, i.e. (ij),(i,k)eS implies that j = k, and (ij),(kj)eS implies 
that i = k; (3) there are no pseudoknots in S, where a pseudoknot 
consists of base pairs (ij),(k,£) where i<k<j<£; (4) each hairpin 
loop has at least 8 unpaired bases; i.e. (iJ)eS implies that 

j-i>e+i. 

In software such as mfold [9], Unafold [39], RNAfold [40], and 
RNAstructure [38], the parameter 0, denoting the minimum 
number of unpaired bases in a hairpin loop, is taken to be equal to 
3, due to steric constraints of RNA molecules. 

The nearest-neighbor and extended nearest-neighbor triplet 
models are additive energy models that entail free energy values 
for loops, as explained in [41]. A hairpin in a secondary structure S 
is defined by the base pair (ij), where i+ 1, . . . j— 1 are unpaired. 
A left bulge in S is defined by the two base pairs (ij),(kj— l)eS, 
where i + 1 < k and i+ 1, . . . ,k — 1 are unpaired. A right bulge in S 
is defined by the two base pairs (ij),(i+l,k)eS, where k<j— 1 
and k + 1 , . . . j — 1 are unpaired. An internal loop in S is defined by 
the two base pairs (ij),(k,£)eS, where i+ 1 <k and l<j— \ and 
! + 1, . . . ,k— 1 and £+l,...j—l are unpaired. Finally, a k-way 
junction, or multiloop with k— 1 components, is defined by the 
closing base pair (ij) and k— 1 inner base pairs 
(x\,y\), ■ ■ .,(x k -uyk-\), where i<x l <y l <x 2 <y 2 < ■ ■ ■ <x k _ x 
<yk-l<j, and the nucleotides in intervals [i+ l,X\ — 1], 
[y\ + l,X2 — 1], . • . ,\yk-i + are all unpaired. See Figure 2 

for an illustration. 

Given the RNA nucleotide sequence a\,...,a„, we use the 
notation H to denote the free energy of a hairpin, E(S) to denote 
the free energy of a stacked base pair, E(i) to denote the free 
energy of an internal loop, E(B) to denote the free energy of a 
bulge, while the free energy ii(M]) for a multiloop containing N/, 
base pairs and N u unpaired bases is given by the affine 
approximation a + bNb + cN u - The free energy £(M1) of a 
multiloop having exactiy one component is then given by 
a + b + cN„. 

For RNA sequence a\,. . . ,a n , for all 1 <i<j<n, the partition 
function Zy is defined by JZs e^ E< - s ^ RT , where the sum is taken 
over all secondary structures S of a[ij\, E(S) is the free energy of 
secondary structure S, R is the universal gas constant with value 
R = 0.001987 kcal/mol -1 K - , and T is absolute temperature. In 
the Zuker [9,18,38] and McCaskill [19] algorithms, E(S) is the 
Turner nearest neighbor energy model; in contrast, when 
discussing the extended nearest-neighbor energy model, we use 
E(S) to denote the triplet energy model. 

Given an RNA sequence a\, .. . ,a„, in order to compute the 
partition function Z\ t „ [resp. minimum free energy E\ iH \ for 
a\,. . . ,a„, we need inductively to determine the partition function 
Zjj [resp. minimum free energy j] for all smaller subsequences 
a, , . . . ,Oj. In so doing, we need to know which structures involve a 
triple stack (ij),(i+lj—Y),(i+2j—2), which structures involve 
only a stacked pair (ij),(i+\j— 1), and which structures involve a 
base pair (ij) which closes a loop region. This is accomplished by 
terms ZBBij [resp. EBB, J]. Moreover, the Turner energy model 
stipulates that a base pair (ij), which closes a left bulge of size 1 , as 
in (ij),(i+2j— 1), or a right bulge of size 1, as in (ij),(i+ 1 j— 2), 
is considered to stack on the subsequent base pair. This 
consideration requires the introduction of special terms ZBBL, j, 
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ZBBRjj [resp. EBBL, h EBBRjj] . With that, we have the 
following definition. 

Definition 1 (Energies and partition function for triplet 
loop model) 

• Eij+i-j— ij denotes the base stacking free energy from the MN model, 
while £/,[+i,i+2;/-2,;-l,/ denotes the triplet stackingfree energy from the 
ENN model. 

• Zjj: partition Junction over all secondary structures of a[ij]. 

• ZBj f. partition function over all secondary structures of o\ij], which 
contain the base pair {ij). 

• ZBBj f. partition function over all secondary structures of a[ij], which 
contain the base pairs 1 J — 1). 

• ZBBLj j\ partition function over all secondary structures of a[ij], which 
contain the base pairs (ij),(i+2j— 1). 

• ZBBRjj: partition function over all seconday structures of a[ij], which 
contain the base pairs lj — 2). 

• ZMjf. partition function over all secondary structures of a[ij], subject to 
the constraint that a[ij] is part of a multiloop and has at least one 
component. 

• ZM\j j \ partition function over all secondary structures of a[ij], subject 
to the constraint that a[ij] is part of a multiloop and has at exactly one 
component. Moreover, it is required that i base-pair in the interval [ij] ; 
i.e. (i,r) is a base pair, for some i<r<j. 

• Ejj: minimum free energy over all secondary structures of a[ij\. 

• EBj f. minimum free energy over all secondary structures of a[ij], which 
contain the base pair (ij). 

• EBB, j : minimum free energy over all secondary structures of a [ij] , which 
contain the base pairs (ij),(i-\- ij — 1). 

• EBBLjf. minimum fee energy over all secondary structures of a[ij], 
which contain the base pairs (ij),(i + 2j — 1). 

• EBBR^: minimum free energy over all secondary structures of a[ij], 
which contain the base pairs (ij),(i+lj — 2). 



• EMjj: minimum free energy over all secondary structures of a[ij] , subject 
to the constraint that a[ij] is part of a multiloop and has at least one 
component. 

• EM\if. minimum free energy over all secondary structures of a[ij], 
subject to the constraint that a[ij] is part of a multiloop and has at 
exactly one component. Moreover, it is required that i base-pair in the 
interval [ij]', i.e. (i,r) is a base pair, for some i<r<j. 

Details for the recursions necessary to compute the ENN 
minimum free energy secondary structure and ENN partition 
function are given in the Methods section. 

Subsection 2.2: Extended nearest-neighbor energy 
model ENN-13 

Here we describe details for the extended nearest-neighbor 
energy model parameters, which we denote by ENN-13, since our 
code RNAenn was completed in 2013. 

Though some related experimental work has been done, 
especially by D.M. Gray and co-workers [11,12,33], there are 
no available experimentally determined parameters for triplet 
stacking. Rather than using the triplet stacking free energy 
parameters INN-48 [34], which are incomplete since GU-wobble 
pairs were not included, we instead infer RNA triplet stacking free 
energies by a novel use of Brown's algorithm [42], which computes 
the maximum entropy joint probability distribution that is consistent 
with given user-specified marginal probabilities. Though Brown's 
algorithm has been used by C. Burge to predict intron-exon splice 
sites in the human gene finder, GenScan [43] , this appears to be the 
first use of Brown's algorithm to infer free energy parameters. 

Brown's algorithm for maximum entropy joint 
distribution. In [42], D.T. Brown described an efficient 
algorithm to compute the maximum entropy joint probability 
distribution given certain marginal probabilities, where we recall 
that the entropy of a joint probability distribution p : £I"->[0,1] is 
defined by H(p)= - J2 Xl ,...,x^nP( x ^ ■ ■ ■ >*»)' in P( x u- ■ ■ ,x„). Al- 
though the algorithm was correct, there was an error in Brown's 



Algorithm 2 (Brown's algorithm [44]) 

INPUT: Finite sample space Q, integer n > 1, t > 0. set of arbitrary (target) marginal probabilities 
Po„ o lm 

OUPUT: Maximum entropy joint probability distribution p : 12" — [0,1] having given marginals (i.e. 
within £ of target marginals). 

k = im- 

for all xi,. . . ,x„ € n" 

p(xi 

# initialize p to uniform distribution 
repeat 

for each marginal ?„,,,...,„,„, 

A/'=p 0<l „, m # A/' is target marginal 

** = «.)€n».x„ =.„ x, m =.,„, P(xi , . . . , x„) 

* M = marginal of p 

for all Tt ,x„ 6 fi" 

if x (1 =o il ,...,x,„, =a,„ 

p(xi,...,x„) =p{xi,...,x„)-Tr 
S = max{ [marginal of p-target|} 
#where max is taken over all marginal/target pairs 
until 5 < t 

Figure 3. Pseudocode for Brown's algorithm. 

doi:1 0.1 371 /journal.pone.008541 2.g003 
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Figure 4. Secondary structure of the XPT guanine riboswitch from Bacillus subtilis, with experimentally determined 148 nt sequence 
CACUCAUAUA AUCGCGUGGA UAUGGCACGC AAGUUUCUAC CGGGCACCGU AAAUGUCCGA CUAUGGGUGA GCAAUGGAAC 
CGCACGUGUA CGGUUUUUUG UGAUAUCAGC AUUGCUUGCU CUUUAUUUGA GCGGGCAAUG CUUUUUUU taken from Wakeman 

et al. [58]. (Left) Gene off structure, determined by in-line probing - see [59] for X-ray structure of aptamer, which is consistent with the secondary 
structure. (Center) Minimum free energy (MFE) structure, determined by RNAnn, our implementation of the nearest-neighbor energy model. This 
structure is identical to the MFE structures computed by Vienna RNA Package RNAfold [18], RNAstructure [38], and mfold [9]. (Right) Minimum free 
energy (MFE) structure, determined by RNAenn, our implementation of the extended nearest-neighbor energy model. The only difference with the 
nearest-neighbor MFE structure lies in two missing GU base pairs (116,134), (117,133) indicated by a circle. 
doi:1 0.1 371 /journal.pone.008541 2.g004 



proof of correctness, which was subsequently repaired by Ireland 
and Kullback [44], who additionally showed that the maximum 
entropy distribution is not the maximum likelihood distribution. 

Suppose that p(x\ , . . . ,x„) is a given joint probability distribu- 
tion on Q", where Q, is the alphabet {A,C,G,U} of RNA 
nucleotides. Recall that a marginal probability distribution 
Pa, ,...,a im '■ Q"~ m ->[0,1] is defined by the projection 



leftMargProb(J$,y) -- 



Y^s stack(<S,/?) + stackG8,y) 



midMargProb(cc,y) = 



J2s stack(a,cS) + stack((5,y) 



E 

(x x ,...,x n )eC^,x ix =o f j ,...,x im =a im 



p(x\,...,x„) 



right MargProb(a,P) - 



J2 d stack(o:,/?) + stack(jS,<$) 



Given an integer n>2, a value e>0, and a set of arbitrary 
marginal probabilities p ail ,..., a , m the idea is to initialize p to the 
uniform distribution, then repeatedly update p so that it has the 
correct currendy considered marginal. 

Algorithm 1 (Brown's algorithm [42]) Input: Finite sample 
space Q,, integer n~>2, e>0, set of arbitrary (target) marginal probabilities 
Pa, v ...,a im - Ouput: Maximum entropy joint probability distribution 
p : fi"->[0, 1] having given marginals (i.e. within e of target marginals). 

Idea: 

initialize p to the uniform distribution 
repeat 

for each target marginal M' 

compute current marginal M of p 

until p has all the desired marginals 
See Figure 3 for more detailed pseudocode of Brown's 
algorithm. 

Conversion between free energies and probabilities. To 

compute triplet stacking free energies, base stacking free energies 
from Turner 1999 [or alternatively Turner 2004] energy model 
are converted to marginal probabilities in the following manner. 
Given a triplet stack 

5' — Xi X2X3 — 3' 
3'-Y,Y 2 Y3-5' 

where the outermost base pair occurs at (ij), let a denote the 
outermost base pair (ij) with nucleotides X\,Y\, let f! denote the 
middle base pair (i+lj — 1) with nucleotides Xi,Yi t and let y 
denote the innermost base pair (i + 2j — 2) with nucleotides 
X3 , F3 . It is a well-known principle, first proved by Jaynes [45] and 
subsequently exploited in protein threading algorithms [46,47], 
that a representative database of biomolecular sequences and 
structures has the property that motif occurrences are Boltzmann 
distributed - i.e. motif frequencies are of the form exp( E(moht)/ RT) ^ 
where the partition function Q is the sum of Boltzmann factors of 
all motifs. For this reason, we define the left, middle and right marginal 
probabilities of stacked base pairs by: 



where 8 ranges over the six base pairs GC, CG, AU, UA, GU, 
UG, stack(a,/?) denotes base stacking free energies from the 
Turner 1999 model [or alternatively Turner 2004 model], and the 
partition function 



Q = ^2 stack(a,/?) + stack(j?,y). 



In words, the left/middle/right marginal probability is defined 
by the quotient of the sum over all six base pairs in the left/ 
middle/right position, while fixing the remaining two base pairs, 
divided by the partition function. We then apply Brown's 
algorithm to compute the joint probability distribution P(a.,f5,<>) 
for all base pairs 0£,j6,<5, and thus obtain triplet stacking free 
energies 



E(<x,p,y)= -RT\n(QP(a&y)). 



(2) 



An additional potential advantage of the extended nearest 
neighbor energy model is that the MFE structure is perhaps less 
likely to have isolated base pairs, than when using base stacking 
free energies. In particular, Bompfunewerer et al. [48] described 
an 0(« 3 ) algorithm to compute the MFE structure and partition 
function over all canonical secondary structures; i.e. those having no 
isolated base pairs, where an isolated base pair (ij) has no adjacent 
base pair (i+ Ij— 1) or (i— lj+ 1). Bompfunewerer et al. stated 
that preliminary studies indicated that canonical MFE structure 
prediction is both faster and more accurate. In [49] we provided 
theoretical reasons for the computational speed-up, by using 
complex analysis to prove that the asymptotic number of canonical 
secondary structures is 2.1614-w~ 3 / 2 -1.96798", compared to the 
much larger number 1.104366-«- 3 / 2 -2. 618034" of all secondary 
structures, a result obtained in [50] by a different method. 

Apart from the triplet stacking energy, ENN-13 contains free 
energies for base stacks (used only at stem ends), hairpins, bulges, 
internal loops and multiloops from the Turner NN model - here, 
the user may choose between the Turner 1 999 parameters and the 
Turner 2004 parameters, the former taken from Vienna RNA 
Package 1.8.5 and the latter taken from the Nearest Neighbor 
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Figure 5. Melting curves for two small nucleolar RNAs (snoRNA) from family RF00158 from Rfam version 9.0 [56]. For each RNA 
sequence, over a range of temperatures, temperature-dependent base pair probabilities were computed using four different software packages: 
RNAenn, RNAnn, version 1.8.5 of RNAfold [40] and RNAstructure [38]. The software RNAenn (RNA extended nearest-neighbor) is our implementation 
of the algorithms described in this paper, while the software RNAnn (RNA nearest-neighbor) is our implementation of the following algorithms: 
Zuker's minimum free energy structure algorithm [51], McCaskill's partition function algorithm [19], and the Ding-Lawrence sampling algorithm [22]. 
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Each algorithm was run without dangle or coaxial free energies. At each temperature T, for each algorithm, the expected number {BP} of base pairs 
was computed as (BP} = Y.\<i<j<„P'f' f° r eacn algorithm, the collection of such (J,<SP» points generates a melting profile obtained by that 
algorithm. (Left) Melting curves for the 72 nt small nucleolar RNA (snoRNA) from Ornithorhynchus anatinus (platypus) with GenBank accession code 
AAPN01359272.1/4977-5048 and sequence given by AGCACAAAUG AUGAGCCUAA AGGGACUUAA UACUGAAACC UGAUGUAACU AAAUAAUAUA 
UGCUGAUCGU GC (Right) Melting curves for the 69 nt small nucleolar RNA (snoRNA) from Otolemur garnetti (small-eared galago) with GenBank 
accession code AQR01 1 79445.1/1047-1 1 15 and sequence given by GGCACAAAUG AUGAAUGACA AGGGACUUAA UACUGAAACC UGAUGUUACA 
UUACAAUGUG CUGAUGUGC. 
doi:1 0.1 371 /journal.pone.008541 2.g005 



Data Base (NNDB) http://ma.urmc.rochester.edu/NNDB/ [15]. 
For readers interested in the exact nature of the NN energy 
parameters, we recommend the excellent overview by Zuker et al. 
[58]. 

Discussion 

There are some deviations between the MFE structure 
computed for the ENN model, compared with the nearest- 
neighbor (NN) model. In particular, Figure 4 shows the secondary 
structure for the XPT riboswitch from Bacillus subtilis, obtained by 
experimental in-line probing (left panel), minimum free energy 
structure computation for the NN model (middle panel) and 
minimum free energy structure computation for the ENN model 
(right panel). The MFE structure for the NN model was identical, 
using four different software packages: mfold [9], RNAfold [40], 
RNAstructure [38] and our own program RNAnn for the nearest- 
neighbor model. Our software RNAenn for the ENN model differs 
from the NN minimum free energy structure, only by missing two 



GU-wobble base pairs at positions (116,134), (117,133). Adjacent 
wobble pairs are energetically weak, so we do not view this as a 
failure of our software, but rather the need for additional scrutiny 
of the ENN energy parameters. Specifically, in the future, we 
intend to include a dependence on the heat capacity ACp as 
proposed by M. Bon [17], and knowledge-based potentials 
[16,17]. By such energy re-parametrization, we expect to improve 
the sensitivity values reported in Table 1 . 

Our next-nearest-neighbor triplet energy model appears to lead 
to somewhat more cooperative folding than does the nearest 
neighbor energy model, as indicated by sharper sigmoidal 
transition in the melting curves obtained by RNAenn, compared 
to melting curves obtained by RNAfold and RNAstructure - see 
Figure 5. Here, melting curves were computed in the following 
manner. For each RNA sequence, over a range of temperatures, 
temperature-dependent base pair probabilities were computed. At 
each temperature T, for each algorithm, the expected number 
{BP} of base pairs was computed by (BP} = Yll<i<j<nP'j- F° r 






ZM 



Z I ZB I Z 

m \ 4 + 9 



i r j l 
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ZH 



j i u v j i i+1 r-1 r j-1 j i 



j ' 



j 



j i r-l r j 






i i+1 r-l r j-1 j 

Figure 6. Feynman diagram to pictorially describe recursions described in this proposal for partition function with respect to 
extended nearest neighbor model. For simplicity, this diagram depicts ZBB, but not ZBBL,ZBBR, which correspond to a special treatment for 
particular left/right bulges of size 1, that are treated as stacked base pairs. 
doi:1 0.1 371 /journal.pone.008541 2.g006 
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each algorithm, the collection of all points with (x,y) coordinates 
given by (T,(BP~)) generates a melting profile. 

Methods 

The top level recursion in the computation of the partition 
function [resp. minimum free energy structure] is identical to that 
of McCaskill's algorithm [19] [resp. Zuker's algorithm [51]]. The 
technical difficulty lies in a kind of "2-look-ahead" strategy, to 
determine if a base pair (ij) not only stacks onto the adjacent base 
pair (i+lj—l), but the latter also stacks onto the base pair 
(i + 2j — 2). This leads to technical issues, including a special 
treatment for bulges of size 1, since these are considered to stack 
on the following base pair. 

Subsection 3.1: Partition function algorithm 

This section presents the recursions to compute the partition 
function in the extended nearest-neighbor energy model. Figure 6 
depicts the recursions as a Feynman diagram. (For simplicity, the 
Feynman diagram in Figure 6 depicts ZBB, but not 
ZBBL,ZBBR, which correspond to a special treatment for 
particular left/ right bulges of size 1 , that are treated similarly to 
stacked base pairs.) The unconstrained partition function Zy for 
a,-, . . . ,aj is defined below. Note that the recursions for Zy(D) 
entail a maximum internal loop of size 30, which follows the 
Vienna convention to reduce run time of the algorithm to 
however, our implementation actually uses the more complicated 
treatment of Lyngso et al. [52], which ensures a cubic run time 
while not arbitrarily bounding the maximum size of internal loops. 
A similar remark applies to the treatment of internal loops and 
bulges of size 1 in Section 3.2. 



ZiJ- 



'1 ifj-i<t 

j-e-i 

Zjj-i+ZBjj+ J2 Zij^yZB k j else 

k=i+l 



We now in turn describe the partition functions ZMIq for a 
multiloop having a single component, ZMij for a multiloop 
having one or more components, ZBjj where (ij) pair together, 
and for ZBBij where (ij) and (i+lj—l) pair together. 



Z ij (S) = ZBB ij 



Z, v (H) = exp - 



H(J-i-l,T) 
~RT 



Z, ; (LB)=cxp( - B( hp\zBBL, J + 



2^ ex P( " 

jfc=;+3 



U(k-i-\,T) 
RT 



■ZBkj-i 



Z, v (RB) = exp 



l,T) 



RT 



■ZBBRj ; + 



W / B(j-k-l,T)\ 

2^ eX P ( ^ ) ZB '+ 

k = i + 0 + 2 V 



RT 



l-i-l<1j-r-\+l-i-2<X 



RT 



Z, v (M) = exp - 



a + 2b + TMM(ij, i+lj-1) 
RT 



j-e-2 

ZM i+uk - V ZMhj-i 

jt = i+fl + 3 



ZBB ij = 

0 if/-/<0 + 2 

2y(S) + fiyOH) + Qij(B) + QuQ) + g v (M) else 

where 



ZMhj- 



ZM iJ = 

( 0 

j-e-i 



0 

j 

E ex P 

k=i+e+i 



if j-i<t 
[-^)-ZB hk else 



E exp(-*±^)ZMl y 

k-i 



j-6-2 

E 

k-i 



+ E exp(- ±j)-ZMiyZM\ k + u 



0 



if i < j and j — i<t 



else 



ify-!<( 



Z y (S) + Z W (H) + Z y (B) + Zy(D) + Zij(M) else 



e v (S) = exp - 



E(ij+lj+2;j-2j-lj) 
RT 



ZBBi+u-i 



e, v (H) = exp 



e, v (LB) = exp 



E(iJ+l;j-\j) + H(j-i-3,T ) 
RT 



E(ij+lj+3-J-2j-lj) + B(l,T) 
RT 



ZBBL i+ \j_\ + 

y ^ 3 / E(ij+l;j-lj) + B(k-i-2,T) 
2^ ex P 

k = i+4 \ 



where 
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= exp - 



E(i,i+U + 2J-3J-lJ) + B(l,T] 



RT 



ZBBR i+l j_i + 

& ( E(i,i+l;j-lj) + B(j-k-2,T) 

k=i+d+3 
ZB i+ 2,k 



RT 



j-e-4 j-3 

zZ zZ ex p 

ZBtr 



m + 1 ;j - 1 J) +!((< - i - 2) +(j - r - 2)) \ 
RT J 



e, i/ (M) = exp 



E(i,i+l;j-lJ) + a + 2b 
RT 



j-6-3 

ZM i+1M -yZM\ kJ - 

k = i+6+A 



QL,j{V> = 
j-e-4 j-3 
zZ Yl ex P 

t=i+4r=e+0+l 

ZB (r 



fiLy(M) = exp 



E(i,i+ 2-j - 1 J) + H((l - i ~ 3) + (j - r - 2 )) 
RT 



E(i,i+2;j-lJ) + a + 2b 
RT 



7-0-3 

]T ZM i+u _ v ZM\ k j_ 2 

k = i + 0 + 5 



ZBBRij = 

0 if/-/<0+3 
QR U (S) + QRjj(W) + QR U (B) + QR u (i) + QR U (M) else 



where 



QR-M = exp ( — ) ZBB i+ ij-2 



0 if/-z<6>+3 
QLijiS) + 0Ly(H) + QLijiB) + QL u (i) + QL U (M) else 

where 



n , , ::(/./• 2./- ?:/ 2./ 

QLij(S) = exp ( — ) ZBB i+2 j- 1 



eA v (H) = exp 



E(f,i + 2;y-lJ) + H(/-i-4,r) 



e^, v (H)=exp 



E(f,/+l;y-2^) + H(/-i-4,r) 



2^,(10)= exp - 



E(/,/+l,/+3;7-3j-2j) + B(l,r)' 



ZBBLj + ij-2- 
j-e-4 
zZ exp 

fc=i+4 

ZBu_ 3 



E(m + 1 ;y - 2 j) + Bjfc - i - 2, r> 
a*/ 



gL, v (lB) = exp 



E(i,i+2,i+4;j-2j-lj) + E(l,T) 



RT 



ZBBL i+2J -i + 
^ ( 

2^ ex P - 



E(m + 2;j- 1 j) + B(k-i-3,T) 
RT 



ZB, 



•kj-2 



e^ i/ (RB) = exp 

ZBBR i+ i j_ 2 + 



E(i,i+l,i + 2:j — 4J — 2J) + B(l,T) 
RT 



( E(i,i+l;j-2J) + H(]-k-3,T) 
}^ exp 1 

k = i+0 + 3 
ZB i+2 ,k 



RT 



ez,, v PB)=exp - 



E(/,i + 2,/ + 3;y - 3,/ - 1 j) + B(l ,T) 



RT 



ZBBR i+2 j-\ + 

£4 ( E(i,i + 2;j-lJ) + B(j-k-2,T) 

2_, ex P 

k=i+e+4 \ 



RT 



fi*y(l) = 

7-0-5 y-4 

E E ex p 

/ = <+3 r=£+0+l 

Z2?« r 



E(i,i + 1 ;,/ - 2 J) + m - i - 2) + (/■ - r - 3 )) 



Z5, 



i+3,£ 
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0*y(M) = exp 



E(i,i+l;j-2J) + a + 2b 
RT 



j-e-4 

J2 ZM i+2 ^ v ZMhj-3 



j-e-3 7-2 
E u (i)= min min + 

«=!+2r=«+0+l 



7-9-2 

E tJ (M) = a+2b+ min (£M f+ i, t _i+Mly. 

fc = l + 3 



Subsection 3.2: Minimum free energy algorithm EBB = 

Assume that a\ , . . . ,a„ is a given RNA sequence. Throughout 
this section, we let £y denote the minimum free energy of J + 00 '0 / < 0 + 2 

a f , . . . ,«/, which is computed and stored in arrays by a dynamic j mrn { Gy(S) + Gy(H) + Gy(B) + Gy(!) + Gy(M)} else 
programming algorithm corresponding to the following recursions. 
Once E\^ n is computed, then the minimum free energy structure where 
can be computed by tracebacks. The following recursions are 

obtained from those in the previous section, by systematically ^ t /... , .. - . ~ . , . . „ DD 

replacing sum by minimum, product by sum and coltzmann 
factor by energy. 

(0 ify'-/<< 



Gy(H) = +Uj-lj) + H(j-i-3,T) 



Eij={ f i- 0 - 1 

|min< Etj- 1 ,EB tJ , mm^ £yt_ i + EB kj ) else 



' + oo 



if j-i<6 



min (c-U—k^ + EBik else 

I. k = i+6+l 



EM U = 

' +00 

|7-»-l 1-0-1 -| 

min< min EMlkj + b + c'(k — i), min b + EMij; +EM\k+\j ( else 



if i <j and j — i<6 



j-e-3 

G // (LB) = min{ min E(i,i+l;j-lj) + B(k-i-2,T) + EB kJ _ 2 , 

k = i+4 

E(i,i + 3-J-2J- I J) + 8(1,7") + £57371,+ w _ , } 



= min{ min E(i,i+ — + 

/: — i + 0 + 3 

B(j-k-2,T)+EB i+2yk , 

1 ,i + 2 ; ; -3J-1 J) + E(1,T) + EBBR i+1J - , } 



£5, 



where 



+ oo 



min{£y(§),£y(H),£'y(B),7iy(BI^)} 



ify-/<( 

else 



7-6-4 ;-3 

Gy(I) = m f= min i E(i,i+ 1 ;;' - 1 j) + 
W-i-2)+(j-r-2))+EB lr 



Eij(S>) = EBBij 



£y(H) = H(/-7-l,7) 



G, v (y ) = E((,(+ hj-\j)+a+2b + 

7-6-3 

t min^(£M i+2 ^_i +£ , Mly_ 2 ) 



£y(LB) = 

f7-6-2 

min<^ min B(/c-/- l,7)+/i5 4 ,_ u R(\,T) + EBBLij 

= i+3 



£7373L, V = 

+ oo ify'-(<0 + 3 

min{ GLy(S) + GL, V (H) + G£y(B) + GL, v (0) + GLy(M)} else 

where 



GL, V (S) = E(/,i +2,/+ 3;y - 2j - 1 j) + EBB i+2J - 1 

y v u»IL»/ — 

7-3 

^=^9+2"" " — .-r-A.-v-.-. vj GLy(H) = E(7',7 + 2;7-l,/-) + H(/'-/-4,r) 



min<{ mm ^ J(/'- ../ ■ I. II . ■../■ / /.'flflyj 
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7-9-3 

GL, 7 (LB) = min{ min E{i,i+2;j-\j) + 

k=i+5 

B(k-i-3,T) + EB kJ _2, 

E(i,i+2j+4;j-2j- \jymi 1 , Ty\-EBBL i+2 j- , } 



GRq(M) = E(i,i + 1 ;j - 2 J) + a + 2b + 

7-0-4 

fc min 4 (£ , M i+2 ,i_i +EMl kJ - 3 ) 



7-4 

GL,,(IRIB) = min{ min E(i,i + 2; /- 1 + 
t=i+e+4 

B(/-*-2,r)+£fl,+3jt, 

E(U+2,i+3-J-3j-\J}m(hTh-EBBR i+ 2j-i} 

j-e-A j-3 
GLij(J)= min min E(f,f+2; ;'— + 

^ = i+4r = < + fl+l 

[((£_/_ 3) + (/_ r _2))+£B Ar 



GL, V (M) = E(m + 2;y - 1,/) + a + 26 + 

7-9-3 

min (£ , M, + 3>/f _i+£ , AfU j/ _2) 



£7373^, v = 

+ oo 

min{ GR,j(§) + GR Q (H) + GR,j{\ 
where 



ify-!<0+3 
-GRij(i) + GRij(M)} else 



GR,j(§) = E(i,i + 1 ,/ + 2-j - 3 j - 2 J) + EBB i+ \j- 



GR,j{U) = E(m + 1 ;j - 2 J) + H(j -i-4, T) 



/-G-4 

GRi ,(LB) = mini min E(i,i +l;j- 2 J) + 

k = i+4 

B(k-i-2,T)+EB kJ -i, 

E(y+l/+3;y -3 J -1 J) +B(l,T}^EBBL i+ w _ 2 } 



7-4 

Gi?,- /(RB) = min{ min E(i,i + l;y'-2j) + 

t = i+0 + 3 

B(/-*-3,r)+£fl, + 2jt, 

Egi+y+2y- 4j- 2j>+4B(l, T) +EBBR i+lj _ 2 } 



Conclusion 

In this paper, we have introduced a new energy model ENN for 
RNA secondary structure prediction and implemented it in a tool 
called RNAenn along with new energy parameters for triplet 
stacking inferred using Brown's algorithm. RNAenn is imple- 
mented in C/C++, without any function calls or dependence on 
other programs, such as mfold [9], RNAfold [18], and RNAs- 
tructure [38]. Recursions from the partition function have been 
cross-checked by setting free energy parameters to zero, in which 
case the program returns the number of secondary structures, 
which can be determined by independent simpler methods. 

It is known from experimental work of Silverman and Cech on 
Tetrahymena group I intron P4— P6 domain [53] that RNA folds 
cooperatively. The melting curves in Figure 5 demonstrate that 
our ENN model leads to somewhat more cooperative folding than 
does the nearest neighbor energy model, in the same manner that 
the melting curves of Figure 1 demonstrate that the nearest 
neighbor energy model leads to more cooperative folding than the 
simple Nussinov energy model. For this reason, we feel that 
RNAenn supports a mathematical model that better reflects the 
experimental data concerning cooperativity of RNA folding. 

From the benchmarking comparison in Table 1, it is clear that 
triplet stacking free energy parameters need further refinement to 
produce better agreement with RNA secondary structures, as 
determined by comparative sequence alignment or X-ray struc- 
ture. This situation is not unlike the situation with nearest 
neighbor software mfold, RNAfold, which over the years 
underwent a series of refinements, with the introduction of 
additional energy parameters (energy parameters for particular 
triloops, tetraloops, bulges of size one, etc.). At the present time, 
software such as Unafold, RNAfold, RNAstructure remain state- 
of-the-art for RNA secondary structure prediction. However, in 
future work, we plan to optimize the triplet stacking energy 
parameters, by using knowledge-base potential as in the work 
[16,17] for the nearest neighbor model. 
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